[{"content":" introduction i have been playing games for a while, and it has been usually fps games, where it started from cs1.6 to csgo for more than 3,000 hours and playing valorant for 2 more years, and a couple of other ones, and at some point i realized i need to try some new style and started playing dota 2.\nmy first-ever game of dota 2 was at around may 2022, where i had no idea what i was doing and why i was doing it, those 5 games i remember closing them in between and had no idea what to build by looking at the items in the shop. for context, dota has around 125 heroes and understanding all the heroes takes significant amount of time, and even on top of it when i saw rubick hero, which steals other hero\u0026rsquo;s spells, i was so impressed by the game design and i devoted a major amount of time understanding and tried being good at it.\nthe journey through time source code all the visualization/animation is done via python, you can have a look at notebook with all the code present: [dota2-player-analysis.ipynb](/notebooks/dota2-player-analysis)\noverall winrate progression if you notice, initially, for a few 100 games, i struggled understanding the game because of which my net wins were in negative. but slowly, i started winning more than losses because of which win rate did increase and got almost linear thereafter.\nyour browser does not support the video tag. patch behaviour also, another quick observation, at some patches (blue lines), it requires players to complete objectives which makes them play into an uncomfortable position which leads to losing streak. for example, playing those heroes which they are not comfortable\nwin and losses per month this has always been surprising that my win/loss per month has been always approx 50%, still it doesn\u0026rsquo;t mean a player is not improving, for example plot above shows win/loss progression has been changing, and game itself is adapting which is giving me tougher games and slowly making me grow rather than throwing me in some random unfair game continuously.\ndota 2 gaming activity for past 3 years recent activities for past few weeks, i have been playing a bit less, since i started playing deadlock\nplaytime distribution one of the best visualization which i did, personally i dislike playing in the morning even if i have time, but you will notice few played time, which is there because i did play few during the weekend.\nsleep/dinner time as per my samsung sleep time statistics, my average bedtime is 04:55, hence a couple of games played even at around 04:00. 
another interesting observation: fewer games are played at 21:00 compared to 20:00 and 22:00, because that is when i eat dinner.\nplaying patterns throughout the week this is an interesting plot showing my playtime across the week and hours of the day. if you notice, on weekends i have been very aggressive about playing early or playing late, until 05:00.\nheroes played throughout my dota journey as mentioned in the introduction, rubick has been my favourite hero, and it is the hero i have been aiming to reach the grandmaster title with (a title to prove my love for that hero); you might enjoy this visualization of which heroes i have been playing for the past few years.\nview it in fullscreen for a better experience\nheroes selection as mentioned, dota 2 has around 125 heroes, out of which i liked playing support heroes a lot. that changed from 2023, when i started experimenting with a few heroes and enjoyed the offlane role as well; hence the slow growth for the top 10 heroes except rubick, and new heroes coming in on the left side.\nheroes played by their attributes some interesting observations here: i hate playing agility heroes, and i enjoy playing intelligence heroes, which are mostly supports, and strength heroes, which are offlaners.\n2024 compendium quest one of the things i like most about dota is the events they put out. i do feel bad thinking about some of the great or insane events i missed before 2022; looking at them now, i feel like i missed a lot. csgo got maybe 1-2 operations in my 4-year journey, but dota 2 has been playing a totally different game, just look at the prize pools for 2019-2021.\nthe compendium is like an interactive digital battle pass or seasonal passport in dota 2 that turns watching and playing the game into an engaging collection experience. think of it as a combination of a fantasy sports league, a digital collectible album, and a progression system.\nthe end goal is to reach level 300, which in turn gives you a miniature-size trophy of \u0026ldquo;the international\u0026rdquo;, the major tournament which happens once every year. to reach level 300, one has to complete objectives to earn levels:\nplay and win one dota game every day for 8 weeks continuously play the fantasy game and score in the 99th percentile predict tournament winners achieve patterns in bingo cards result bingo in bingo, i missed 3 cards in total, which is actually good as i lost only 6 levels.\nbingo card 1 bingo card 2 bingo card 3 fantasy in fantasy, i reached the maximum 99% rewards for two parts; in the second part i somehow messed up the key attributes assigned to players.\nfantasy 1: 99.01% fantasy 2: 96.90% fantasy 3: 99.15% oracle: win prediction win prediction was actually tough for me. surprisingly, my predictions for team liquid were spot on; i did hope tundra would win, but that didn't happen, so i lost a few points there.\ndecider lower bracket 1 lower bracket 2 finals point logs as you might have seen, i have been making data-driven decisions, and collecting the data has been a challenge. when it came to the point logs for my compendium 2024, i didn't know how i would be able to export this data into a table.\ni tried using vconsole2, which shows network calls to the dota 2 coordinator, but it didn't show any raw messages. i tried using wireshark as well to intercept the messages, but the binary data is difficult to decode. 
based on my knowledge i didn't know whether it was possible or not, so i decided to go the hard way, which is letting computer vision do the job.\nhence i recorded a video of the points log scrolling from bottom to top, fed each frame to python opencv code to detect unique entries, and created a csv file.\nframe from video grayscale thresholding full source code: link\nvideo link: link\nand finally, the parsed csv file: link (2 weeks of points missing)\nconclusion overall, i spent some money up front to purchase the compendium 2024, plus 1 extra set of levels to complete my 300 levels, and as of now i am really looking forward to the aegis 2024, hence looking forward to posting an image here :)\ni regret buying the extra levels, as i didn't know that week 8 would still be in progress; as of now my final level is 343.\nbonus hackerman moment\n03 oct 0.01h 07 oct 0.02h 10 oct 0.05h 14 oct 0.01h 21 oct 1.71h 22 oct 2.99h 23 oct 1.97h 24 oct 3.73h 25 oct 1.21h in total i spent approx 11.7 hours or more to write this blog post across 9 sessions.\nthank you for reading till the end :) ","date":"2024-10-25","permalink":"https://shashanksharma.xyz/posts/dota-compendium-2024-quest/","summary":"Introduction I have been playing games for a while, and it has been usually FPS games, where it started from CS1.6 to CSGO for more than 3,000 hours and playing Valorant for 2 more years, and a couple of other ones, and at some point I realized I need to try some new style and started playing dota 2.\nMy first-ever game of Dota 2 was at around May 2022, where I had no idea what I was doing and why I was doing it, those 5 games I remember closing them in between and had no idea what to build by looking at the items in the shop.","title":"a data-driven look at my dota 2 compendium quest"},]
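the frame-by-frame opencv approach described in the post above can be sketched roughly as follows. this is a minimal illustration, not the author's actual script: the file names, the threshold value, and the hash-based dedupe are assumptions (real per-entry extraction would additionally need ocr, e.g. pytesseract, on each kept frame).

```python
import csv
import cv2

# hypothetical recording of the points log scrolling from bottom to top
cap = cv2.VideoCapture("points_log.mp4")

seen = set()   # hashes of frames already kept, to skip duplicates
rows = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # grayscale + thresholding, as described in the post, to isolate text
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)
    # crude frame-level dedupe: hash the thresholded pixels and skip repeats;
    # turning kept frames into log entries would require ocr per row
    key = hash(binary.tobytes())
    if key not in seen:
        seen.add(key)
        rows.append([int(cap.get(cv2.CAP_PROP_POS_FRAMES)), key])

cap.release()
with open("points_log_frames.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame_index", "frame_hash"])
    writer.writerows(rows)
```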
[{"content":"","date":"2024-10-25","permalink":"https://shashanksharma.xyz/notebooks/blog-time-spent/","summary":"","title":"total time spent while working on something"},]
[{"content":"","date":"2024-10-24","permalink":"https://shashanksharma.xyz/notebooks/dota2-player-analysis/","summary":"","title":"dota 2 player analysis"},]
[{"content":"introduction one day, i had a thought, why is it that other organization can track all my stats and not share those raw data with me, even though i am the one who is generating it, obviously few of the services do allow sharing some insights like netflix watch history or google activities (limited), but it has its own issues like they are not detailed, or some histories are not persistent.\nfor me, it all started with my obsession with generating and tracking data, and also in school days when i used to track my day-to-day activities manually whenever i unlocked my phone, but obviously such manual tracking is far too irritating. now tracking my application usage can be either tracking my day-to-day activities, or all the processes running in my devices or tracking what\u0026rsquo;s running as an active application (application which i am consuming via display) and hence i choose the latter, which is tracking active application and requires minimal intervention from my end.\nrecently, i started using software to track my application usage on my laptop to generate raw data which later on i can use for analysis. i started this exercise when i joined hackerrank and started using an open-source software called \u0026ldquo;tockler\u0026rdquo;. it\u0026rsquo;s been over 2 years using tockler, and have been going great\nnow it\u0026rsquo;s been a few months, i have been very much into self-hosting and one of the features which i was looking for is, how to view all my device status if they are online or not, and if let\u0026rsquo;s say i had to write a client-side service for it, how will it work, and how the logic works like? and that is when i had a thought, what if i have to show my daily activity on my website, which let people know whether i am online or not, and if yes, then what exactly i am doing. this is very much similar to discord activity status\nchallenges with existing tracking software one thing which i am sure of was, i need a backend which may or may not store the activity and an api to fetch the details of it. for that to happen, there is a need to consume the data which is getting generated by these software packages, is it either interacting with their apis if they have, or directly access their storage, or make modifications to the software itself to publish it to some external storage.\nthat is when i started understanding how tockler works, since i have been using it for 2 years, and started looking if it is possible to publish the activity. one thing which i understood: there is no programmatical way of fetching data easily data is getting stored in sqllite3 once there have been a feature of pushing data into firestore, but it\u0026rsquo;s been deprecated after looking at these facts, i went ahead and decided, what if i can make publish the events every time an activity changes, which is tracking anytime it makes changes to sqlite3, it pushes to some other service. 
hence i added an event listener to understand this:\n+ appManager.knexInstance.on('query', (queryData) =\u0026gt; { + logger.info(`executing query: ${queryData.sql}, ${queryData.bindings}`); + }); even though it worked fine, it has its own disadvantages:\ni have to write a custom parser which understands these events and updates my backend service i will have to introduce changes to the forked repository and maintain it if an event fails to sync with the upstream service due to an internet issue or an api failure, i need a backup strategy, by maintaining state or having some queueing mechanism looking at these facts, it all becomes more complicated, and it made me realize that i need a better alternative to tockler which supports the same features but with more flexibility for my requirements.\nnew tracking software while looking for new software to track application activity, one big constraint i had was \u0026ldquo;portability\u0026rdquo;: it should be able to run on windows, macos and android (optional). after some research, i found activitywatch.\nactivitywatch's design looks somewhat like a microservice architecture: it has clients which run at localhost:3000, a server with api endpoints written in rust (migrated from python), and a set of watchers which track activity.\nthe way it stores data is easy to consume: https://notes.shashank-sharma.xyz/services/activity-watcher/schema\nsince the server runs on the client itself, the apis can be accessed from localhost:3000, and once i looked at the rust code, it indicated that events can be fetched, with filters as well.\none thing i was able to conclude was that i need not make any changes to the software itself; it gives me enough apis to use from an external service which i can create.\napi endpoints for activitywatch: https://github.com/activitywatch/aw-server/blob/master/aw_server/rest.py\ncreating backend so far i have mostly worked in the python ecosystem, which means flask, django/drf and fastapi, and recently i switched to golang when i joined hackerrank; given i have been working mostly in that language, i decided to use it so that i can learn more. at the same time, i don't want to build things from scratch like auth, db setup, logging and more, and while looking at a couple of frameworks i liked pocketbase, given it is open-source and has all the requirements built in.\none thing to note: pocketbase was originally not meant to be used as a framework, but i still decided to extend its features based on the documentation and write custom apis as per my requirements. as my next step, i had to understand how data is stored in activitywatch, read its sqlite3 schema, and decide what i need to store in my backend, which i have documented here: link\nonce finalized, another crucial decision was how to send an online event to the service saying whether the device is online or not and what exactly is running. the first simple approach is to have a post api endpoint and send the necessary details, or simply ping it. 
another viable approach could be figuring out newer protocols and using their features; for example, mqtt was something i liked, but the backend i chose didn't have it implemented and would require changes, hence i decided to go with the rest api approach.\ni decided to create a table which can support both the tockler backfill and the activitywatch events sent via api. once done, i created custom api endpoints for easy communication and added the necessary logic for handling edge cases, duplication and consistency over which records need to be created with the necessary fields present. i spent a good amount of time structuring my code for easy maintenance and better readability, and it can be seen here\nat last, i needed a better way to send events in an authenticated manner, not too strict on security but at least allowing only my events to be pushed, without worrying about auth token expiry or refresh tokens. for that, i created one static dev token which can be used for each api call specific to updating tracking items, and which only allows updating records for that particular user (a rough sketch of such a token-authenticated push is shown below).\ncreating client once the backend was ready, i wanted a client able to fetch the necessary records from the activitywatch apis and push them to the server. the requirements: it should be a desktop application, portable enough to run on windows/mac/linux/android, and able to keep running in the background if closed. this time i had a bias towards using golang for everything and picked fyne as my application framework.\ni started to play around with it, ran it on every device, and it has been working great. one downside i felt is that it is not optimized for resource usage: just a hello-world application bundles into a 70+ mb artifact, and ram usage is also slightly high relative to its job, which is nothing but making a post request every x seconds, but this is something anyone can live with.\ni spent some time writing a poc and tried it on each platform, and it worked perfectly. one downside on android is that it is not guaranteed to keep running as a background process, given how android handles processes; for example, the battery optimization feature might decide my app is taking a significant amount of battery and pause it. this can be overcome by changing a few settings, giving the app high priority, etc., but the android implementation is still experimental when it comes to fyne.\nonce finalized, i spent a good amount of time understanding how to work with fyne and how to structure the code; i decided to create a better layout for handling routes/pages, cron jobs, clients like the backend/activitywatch apis and more. i am skipping over a couple of things here, which are:\nability to push all events which have been tracked by activitywatch from day 1 maintaining state of which events have been processed figuring out all the buckets available in activitywatch and syncing only the user-preferred events detecting ongoing application usage and resending it so that it can be updated and the rest of the common stuff like auth checks, logging, cron initialization, closing all crons if the app is closed, etc. 
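as referenced above, a rough sketch of what such a token-authenticated push could look like. everything here is hypothetical — the route, field names and token handling are illustrations, not the actual pocketbase-backed api from the post (and the real client is written in go with fyne, not python):

```python
import requests

BACKEND_URL = "https://example-backend.xyz"   # hypothetical backend base url
DEV_TOKEN = "static-dev-token"                # long-lived token, as described

def push_active_window(device_id: str, app: str, title: str) -> None:
    """send one 'currently active window' event to the tracking backend."""
    resp = requests.post(
        f"{BACKEND_URL}/api/track/active-window",   # hypothetical route
        headers={"Authorization": DEV_TOKEN},
        json={"device": device_id, "app": app, "title": title},
        timeout=5,
    )
    resp.raise_for_status()

# the client would call this every x seconds with the current foreground app
push_active_window("laptop", "firefox", "reading docs")
```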
again, all this code can be found here\nand at last, everything worked great, and i have been using this app for more than 3 months.\nfinal result i did give some thought to adding a feature to spoof the data shown on my website, because showing my accurate application usage may bring unnecessary complications, but at the same time, spoofing data would ultimately mean cheating the visitors who come to my website and look at the status. hence, i decided to keep showing my application activity as it is, and if in the future i face any trouble with it, i'll simply hide the application name and show only the online status.\nconclusion doing all this gives you the satisfaction of seeing all your activity in one place, knowing that you can peek into the past and see what you did over the past x years. having the data in one place opens up a lot more use cases for what you can do moving forward, for example:\nwhat is my working time vs personal time how distracted am i in a given timeframe across different devices which application have i been using a lot lately, is it zoom for meetings or slack for async communication and the one i am actually interested in: using an llm to understand the data or find anomalies which are difficult to spot with just sql queries/filters. bonus the mountain graph which you can see at the bottom is built from this same data; i have a python notebook for fetching and generating it, which can be found here\n05 aug 0.81h 06 aug 0.25h 07 aug 3.85h 08 aug 4.42h 09 aug 1.13h 10 aug 3.89h 11 aug 4.19h 12 aug 0.85h 13 aug 0.05h 14 aug 3.04h 15 aug 1.53h 16 aug 1.41h 17 aug 3.79h 18 aug 1.55h in total i spent approx 30.8 hours or more to write this blog post across 14 sessions.\nthank you for reading till the end :) ","date":"2024-08-20","permalink":"https://shashanksharma.xyz/posts/building-live-device-feed/","summary":"Introduction One day, I had a thought, why is it that other organization can track all my stats and not share those raw data with me, even though I am the one who is generating it, obviously few of the services do allow sharing some insights like Netflix watch history or google activities (limited), but it has its own issues like they are not detailed, or some histories are not persistent.\nFor me, It all started with my obsession with generating and tracking data, and also in school days when I used to track my day-to-day activities manually whenever I unlocked my phone, but obviously such manual tracking is far too irritating.","title":"building live device feed for my website"},]
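for reference, reading events out of a locally running activitywatch server (as the post above does from its go client) looks roughly like this in python. the endpoints below are from aw-server's public rest api; the port is the usual default install and may differ from the one mentioned in the post:

```python
import requests

# aw-server usually listens on localhost:5600 by default; adjust if needed
AW_URL = "http://localhost:5600"

# one bucket exists per watcher (e.g. window watcher, afk watcher)
buckets = requests.get(f"{AW_URL}/api/0/buckets/").json()

for bucket_id in buckets:
    # fetch the most recent events from each bucket
    events = requests.get(
        f"{AW_URL}/api/0/buckets/{bucket_id}/events",
        params={"limit": 10},
    ).json()
    for event in events:
        print(bucket_id, event["timestamp"], event.get("data"))
```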
[{"content":"preface the main reason behind this blog is to understand the seriousness of mistakes that were made without any proper arrangements and the impact of such mistakes. in this blog, i\u0026rsquo;ll be covering how the rgpv exam portal was so vulnerable that it exposed around 49 thousands of students\u0026rsquo; private data/submissions publicly and how it could have ruined everyone\u0026rsquo;s exam without any authentication.\nimpact of given vulnerabilities include:\n49,000 students of rgpv student data leak which includes all possible pii data (like: phone number, email, etc) question paper leak submit exam for any other student unlimited image upload to their server lfi exploit which allows you to see file content from their server view all student submission like question answered from final exams which held during 24th aug - 27th aug 2020 introduction after covid-19 pandemic started, rgpv university announced that for final year students, they are planning to organize an exam which will consist of mcq questions (total 40 questions) and it will be an open book exam as mentioned here. this process started by taking two mock exams to test if things are good or not, and then having the final paper from the 24th of august till the 1st of september 2020.\nnow with given information, rgpv decided to create one exam portal for us, which was handled by eduvita. they started registration and creation of portal at rgpvexam.in. how the overall exam process was figured out is by having one unique url for every student assigned, which allows for that particular student to give an exam.\nstep towards testing exam portal during the first day of the exam, there were many issues where it was returning: network error. the full story can be found here which shows that there were technical difficulties and i also faced this issue.\nnow being a computer science engineer, i was already curious about this scenario, and as of my first step, i started with the debugger tool and investigated what was happening. at first, i encountered one debugger state which they initialized in their index.html file, which avoids people to debugging their javascript code. if you are curious, read this here.\non bypassing this, i observed my network tab and found that there was a 504 error returned, as shown in the image.\nnow as of the next step, i was more interested in knowing if their website is vulnerable or not? this curiosity leads me to use the nmap tool, i started one vulnerability check script, and after 5 min, there were two exploits which i found:\nhttp-phpmyadmin-dir-traversal: php file inclusion vulnerability which allows remote attackers to include local files via the redirect parameter, possibly involving the subform array. http-vuln-cve2011-3192: vulnerable to a denial of service attack when numerous overlapping byte ranges are requested. the second vulnerability was more towards ddos attack, but the first vulnerability was shocking, as it was an lfi exploit (local file inclusion). 
to confirm the hypothesis, i tested it by traversing random directories, and after 8 - 10 attempts i was able to reproduce it with:\n\u0026lt;base_url\u0026gt;/index.php/?p=../../../\u0026lt;any_file_name\u0026gt;\nimpact: allows anyone to access local files and directories, which includes anything like environment keys, logs, or other sensitive data.\nunderstanding tech stack before moving ahead, i started by exploring their tech stack, and with a little bit of exploration i found out that they are using:\nfrontend - angular js backend - express web server - nginx database - mongodb with that knowledge, i went ahead observing their api structure and started messing around with it. at first i sent one random post request, and in the response i got:\nthis was my first red flag that things were not right, because in production it is not good practice to expose the error stack. later on the web server went down, so i had to call it a day.\nthe next day i repeated the vulnerability test and found that the lfi exploit was no longer present, which is an improvement. (the lfi was there till the 24th of august 2020.)\non my next paper on the 26th of august 2020 i was prepared for further testing: i started capturing all the network requests and noted down all the endpoints and their content. a typical exam on the rgpv exam portal looked like:\ncheck otp: \u0026lt;base_url\u0026gt;/common/student/urlcheck check dob/fathers name: \u0026lt;base_url\u0026gt;/common/student/checkdob upload photograph: \u0026lt;base_url\u0026gt;/common/student/checkexamconfig confirm profile: \u0026lt;base_url\u0026gt;/common/student/confirmprofile waiting phase: \u0026lt;base_url\u0026gt;/common/student/checkexamconfig exam started: \u0026lt;base_url\u0026gt;/common/paper/\u0026lt;exam_code\u0026gt; exam submission: \u0026lt;base_url\u0026gt;/common/paper/\u0026lt;exam_code\u0026gt; the problem with all these steps is that, except for steps 1 and 3, each post request needs only enrollment_no in its body to get a response, which is worrying because ideally there should be a jwt token in a header acting as authentication.\nimpact given any person's enrollment_no which exists in their database, anyone can:\n1. question paper leak get the full question paper in json format as a response, including the full question list and answer list.\n2. create/update/delete exam submission as explained above, with just the enrollment number anyone can overwrite another student's submission easily, without any authentication.\n3. upload image without auth the third step is uploading your photo through the webcam. from the client side, it captures an image and sends a request to the backend endpoint with only the file content and no authentication. so, in reality, anyone can upload any image to the given endpoint and spam their aws s3 bucket with cat photos.\n4. minor data leak the fourth step is just a confirmation page which takes enrollment_no and responds with:\nname ip address institute name reverse engineering frontend understanding compiled angular js code is quite painful, but i was very interested in all the endpoints registered in the code. as a first step, i saved the rgpvexam.in page, prettified their js code and searched for \u0026ldquo;http.post\u0026rdquo; in it. 
once i understood their code i found all possible endpoint registered over there:\ncheckurl(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { query: e }) } checkdob(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/checkdob`, { query: e }) } checkexamconfig(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/checkexamconfig`, { query: e }) } confirmprofile(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/confirmprofile`, { query: e }) } getresult(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/getresult`, { query: e }) } getquestions(t, e) { return this.http.post(`${this.baseurl}/paper/${t}`, e) } updateresult(t, e) { return this.http.post(`${this.baseurl}/paper/updateresult/${t}`, { query: e }) } updateseen(t, e) { return this.http.post(`${this.baseurl}/paper/updateseen/${t}`, { query: e }) } updateanswered(t, e) { return this.http.post(`${this.baseurl}/paper/updateanswered/${t}`, { query: e }) } getdata(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { query: e }) } searchdata(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { query: e }) } createdata(t, e) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { doc: e }) } updatedata(t, e, n) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { doc: n }) } deletedata(t, e) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { id: e }) } maillink(t) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { enrollment_no: t }) } mailotp(t) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { enrollment_no: t }) } checkotp(t, e) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { enrollment_no: t }) } uploadfile(t) { const e = new formdata; return e.append(\u0026#34;file\u0026#34;, t, t.name), this.http.post(dc.url + \u0026#34;/api/file/uploadstudentimage\u0026#34;, e, { reportprogress: !0, observe: \u0026#34;events\u0026#34; }) } uploadface(t, e) { const n = new formdata; return n.append(\u0026#34;file\u0026#34;, e, e.name), this.http.post(dc.url + \u0026#34;/api/file/uploadface/\u0026#34; + t, n, { reportprogress: !0, observe: \u0026#34;events\u0026#34; }) } this confirms that all endpoints were using enrollment_no as their identifier without any authentication.\nbackend once i knew about all the endpoints, it was time to test other endpoints as well which were not mentioned in the frontend code. 
this is where i started experimenting with random endpoints, because there had to be something out there.\none thing which was repeated in each endpoint was:\n\u0026lt;base_url\u0026gt;/common/\u0026lt;something\u0026gt;\nwhere something is either student or paper, but for testing i tried a random string like \u0026ldquo;table\u0026rdquo;, and the response was a bit shocking, as shown in the image.\nhere, \u0026ldquo;table not found\u0026rdquo; means that anything after /common/ is treated as a table name, and the endpoint returns all data present for that table.\ni tried multiple strings like \u0026ldquo;exam\u0026rdquo;, \u0026ldquo;user\u0026rdquo; or anything sensitive, and found multiple table names which were present.\nto show how serious this is: anyone with bad intentions could have written a script with common table names and spammed the endpoint to extract every single record from every table.\nimpact: the given vulnerability existed till the 28th of august 2020, which covers the mock tests and two final exams for all branches\n1. /common/student reveals the information of the 49 thousand students present in the database who registered for the rgpv exam portal, and this includes every piece of pii which rgpv has (like dob, ip, phone_no, email, etc), including the unique id and otp which were meant to be kept private by each student, and much more.\nin general, the privacy of every student was compromised, and no one knows how many people extracted all this data and may have sold it or used it for marketing purposes.\n2. /common/result reveals every student's submission; this includes all answers given for each question, at which time each question was seen, and more.\nin short, anyone could know which student took which exam, how correct they were, and how much time they took.\n3. /common/institute all institutes with their id and name present in the db. this was not particularly sensitive, but still not good to share.\nconclusion this whole process of organizing the final exam was heavily rushed, which is why i found multiple vulnerabilities. i found lfi and data exposure in every possible way; the privacy of around 49 thousand students was compromised by the exam portal. the problem is that this kind of mistake is not reversible, the damage is already done, but thanks to rgpv that they realized this later on and fixed the problem.\nwhat did i do from my end? as soon as i found out about these exploits, i approached almost all possible contacts:\nrgpv - no reply from \u0026ldquo;rgpvexam2020@rgtu.net\u0026rdquo; nciipc - got a reply + sent a report aicte - no reply for minor questions i reached out via telegram, but there was no response there either.\nwhat's next? as of the 28th of august 2020, we were notified in the telegram channel that:\nin short, some security updates were made, but what exactly they were is still unknown from the given message; i believe they should at least have acknowledged what sort of mess they created to start with. there are still a few things which need improvement.\nfrom my end, once i saw the message, i quickly checked all the api endpoints again, and those exploits are finally gone. getting a student's data or any table data is no longer possible, so that is fixed. the whole examination process now uses a jwt token: once the user enters with a unique url and confirms dob and father's name, a jwt token is returned and used for the subsequent endpoints. 
so this is great news for everyone.\nbut the real question remains: is this the solution? technically yes, but what about the damage which has already happened? the data is already leaked, and because each user's otp was leaked, we are all still exposed.\nhaving no authentication mechanism to start with and introducing it later is an improvement, but why was this compromised on a platform with such a large audience? and who will be held responsible for it? that is something which is still unknown.\n","date":"2024-06-30","permalink":"https://shashanksharma.xyz/posts/how-vulnerable-was-rgpv-exam-2020/","summary":"Preface The main reason behind this blog is to understand the seriousness of mistakes that were made without any proper arrangements and the impact of such mistakes. In this blog, I\u0026rsquo;ll be covering how the RGPV Exam portal was so vulnerable that it exposed around 49 thousands of students\u0026rsquo; private data/submissions publicly and how it could have ruined everyone\u0026rsquo;s exam without any authentication.\nImpact of given vulnerabilities include:\n49,000 students of RGPV student data leak which includes all possible PII data (like: phone number, email, etc) Question paper leak Submit exam for any other student Unlimited image upload to their server LFI Exploit which allows you to see file content from their server View all student submission like question answered from final exams which held during 24th Aug - 27th Aug 2020 Introduction After Covid-19 pandemic started, RGPV University announced that for final year students, they are planning to organize an exam which will consist of MCQ questions (total 40 questions) and it will be an open book exam as mentioned here.","title":"how vulnerable was rgpv exam 2020"},]
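to make the missing-authentication problem described in the post above concrete, this is roughly what such a request looked like, reconstructed from the quoted frontend code. the base url and enrollment number are placeholders; the 2020 portal has long since been fixed, and this is shown only to illustrate why an identifier is not a credential:

```python
import requests

# placeholders only; the portal described above no longer works this way
BASE_URL = "https://example-exam-portal.in/common"
ENROLLMENT_NO = "0000XX000000"

# per the quoted frontend code, most endpoints accepted a body of the form
# {"query": {...}} keyed only on the enrollment number, with no auth header
resp = requests.post(
    f"{BASE_URL}/student/checkdob",
    json={"query": {"enrollment_no": ENROLLMENT_NO}},
    timeout=10,
)
print(resp.status_code)
```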
[{"content":"preface testing\n","date":"2024-06-30","permalink":"https://shashanksharma.xyz/microblog/test-micro-blog/","summary":"Preface Testing","title":"test micro blog"},]
[{"content":" introduction python is really a powerful language and with proper use of it anyone can make beautiful things. after studying python i was really impressed by its power and to be more specific i really love how we can scrape any website easily with the help of python. scraping is a process of extracting data from website by their html data. so i learned its basic and started scraping many website.\nrecently i thought of creating something big through scraping but i was having no idea what to do. then i came across with the site of mp transportation and i realized that they got so many data inside there website. the website is very simple, you open the site enter your transport number details and then search it. then you will get result about your transport vehicle which includes type, color etc.\nwith python2.7 i created one script to scrape because with python 3.x there were less support to some modules. i decided to go for \u0026rsquo;last\u0026rsquo; search type because with others i was facing some issues (may be site problem). for this i will have to search each input from 0000 - 9999 in short it makes around 10000 requests. we took 4 digits because it requires min 4 characters to enter. so yeah it was this large.\ni created one program and started scrapping but then with 0000 input and \u0026rsquo;last\u0026rsquo; type search i found that it scraped successfully and i got 1700+ data. but the problem was that it took 5 minutes to scrape 1 request. this happened because of server delay. it was not my problem but it was server\u0026rsquo;s problem to search this much data from database. after realizing this i did some maths.\nif 1 request take = 5 minutes, then, 10000 requests = 50000 minutes = 833.33 hours = 35 days approx = 1 month 4 days\nso in short i need my laptop to run forΒ 1 month and 4 daysΒ to run continuously and trust me it\u0026rsquo;s really a bad idea to do so. but is it worth doing it ?\nif 1 request is giving approx 1000 data 10000 requests = 10,000,000\nso yeah, hypothetically inΒ 35 days i will be able to achieve 10 millions of data. but still being a programmer we must do stuff as fast as possible and to achieve this one thing is sure that i need some power, memory, security etc. i tried multiprocessing and multi threading but it was not working as expected\nso the solution for this problem was getting your hand on some free servers. so i started searching some free website host company which supports python and thought of deploying my script over there. i tried this in pythonanywhere.com and in heroku with the help of flask framework but there was no success. i waited almost 15 days to decide what to do. later i found one site scrapinghub.com which lets you deploy spider on cloud and rest they will take care of that so i went for it and started learning it.\nafter that i learned how to use scrapy and scrapinghub and i created another new program to scrape website with the help of scrapy spiders. 
source code for this is at the end of this page\nexperiment day 1 - 4,092,328 (4 million records in 17 hours) id1 - items - 1,134,421 (15 hours)\nid2 - items - 1,025,282 (17 hours)\nid3 - items - 983,367 (14 hours)\nid4 - items - 949,228 (13 hours)\nsize - 1.3 gb day 2 - 6,498,462 (6.4 million records in 17 hours)\n(created 2 more ids to boost the process)\nid1 - items - 1,241,643 (17 hours)\nid2 - items - 1,000,308 (15 hours)\nid3 - items - 962,863 (15 hours)\nid4 - items - 1,052,844 (15 hours)\nid5 - items - 1,144,686 (16 hours)\nid6 - items - 1,096,118 (15 hours)\nsize - 2.4 gb\nfinal result total data collected: 10,590,790 total size: 3.7 gb\ntime consumed: 34 hours\nin just 34 hours of scraping we collected the 10 million records estimated earlier. if we had tried to do this the old-fashioned way on a laptop, it would have taken a month, so this is a big optimization.\ndata analysis the main question is what to do with the data and which tools to use for analysis, since our json files are huge. converting the json files to a database would be really nice, but doing that would again require loads of time.\nfrom json to database\nwe can do 5 records per second,\nfor 10,000,000 records = 2,000,000 seconds = 33,333 minutes = 555 hours = 23 days.\nthat is not feasible.\ni even tried doing it through an sql script, which is much better compared to the previous one, but it would still take approx 20 days.\nso we will use the data in json format, load it into a python script and do our maths there. loading one file may take approx 10 minutes, but time is not the issue; the problem is that loading a json file in python takes a lot of memory, and since we are working on a normal laptop we need to think of something else. to avoid this problem i used the ijson module in python: a really handy tool which iterates over the json data rather than loading it all at once (a small sketch is shown after this post). with this we sacrifice a little speed, but it is worth it.\nstats in which district are the most vehicles registered?\nindore - 1625663 bhopal - 1023054 jabalpur - 589875 gwalior - 477625 ujjain - 371559 sagar - 272974 chhindwara - 268971 ratlam - 258581 rewa - 242377 dewas - 240930 link: https://plot.ly/~shashank-sharma/19/\nwhich color do people prefer while buying a vehicle?\nblack - 2137200 red - 683663 not specify - 560975 blue - 341134 grey - 288952 white - 283631 silver - 255836 rbk - 238896 p black - 177379 pbk - 168518 link: https://plot.ly/~shashank-sharma/11/\nwhich company has the most vehicles in mp?\nhero honda motors - 2032369 bajaj auto ltd - 1677867 hero moto corp ltd. - 1563023 tvs motor co. ltd. 
- 1130974 honda mcy \u0026amp; scooter p i ltd - 1102624 mahindra \u0026amp; mahindra ltd - 463175 tata motors ltd - 280684 maruti suzuki india limited - 258392 maruti udyog ltd - 249949 escorts ltd - 139231 link: https://plot.ly/~shashank-sharma/13/\nin which year were the most vehicles registered?\n2016 - 1406802 2014 - 1392520 2015 - 1166079 2013 - 964026 2011 - 845374 2012 - 734092 2010 - 716772 2009 - 607693 2008 - 481315 2007 - 471963 link: https://plot.ly/~shashank-sharma/15/\nwhich vehicle model is most common?\nsplendor plus - 325878 platina - 302537 hf deluxe self cast wheel - 254166 activa (ele auto \u0026amp; kick start) - 216252 tvs star city - 210397 cd dlx - 188885 discover dts - si - 180193 passion pro(drm-slf castwheel) - 163088 activa 3g eas ks cbs bs3 - 162542 passion plus - 146584 link: https://plot.ly/~shashank-sharma/17/\nwhat type of vehicle do people own in majority?\nmotor cycle - 6531708 scooter - 1291932 motor car - 881930 tractor - 687360 goods truck - 210932 moped - 197450 omni bus for private use - 142478 auto rickshaw passenger - 124051 trolly - 111358 pick up van - 95238 link: https://plot.ly/~shashank-sharma/9/\nand many more questions can be answered with the given data.\nthank you for reading till the end of this page. i hope by now you realize the real power of python.\nsource code: https://github.com/shashank-sharma/mp-transportation-analysis\n","date":"2017-04-16","permalink":"https://shashanksharma.xyz/posts/india-mp-transportation-analysis/","summary":"Introduction Python is really a powerful language and with proper use of it anyone can make beautiful things. After studying Python I was really impressed by its power and to be more specific I really love how we can scrape any website easily with the help of python. Scraping is a process of extracting data from website by their html data. So I learned its basic and started scraping many website.\nRecently I thought of creating something big through scraping but I was having no idea what to do.","title":"india's mp transportation analysis through python"},]
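the ijson trick mentioned in the post above can be sketched like this. the file name and record layout (a top-level json array of objects with a "color" field) are assumptions for illustration, not the post's actual schema:

```python
import ijson
from collections import Counter

# stream the huge scraped json file instead of loading it into memory at once;
# "item" iterates over the elements of a top-level json array
color_counts = Counter()

with open("mp_transport_day1.json", "rb") as f:
    for record in ijson.items(f, "item"):
        color_counts[record.get("color", "not specify")] += 1

print(color_counts.most_common(10))
```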
[{"content":"what i do? software engineer at coursera got something to discuss? drop me a mail at: shashank.sharma98@gmail.com","date":"0001-01-01","permalink":"https://shashanksharma.xyz/about/","summary":"What I do? Software Engineer at Coursera Got something to discuss? Drop me a mail at: shashank.","title":"about"},]
[{"content":"π pages python notebooks link microblog link π personal link1 link2 link3 link4 link5 link6 link7 link8 link9 π¨ tools link1 link2 link3 link4 link5 link6 link7 link8 link9 πΊ blog link1 link2 link3 link4 link5 link6 link7 link8 link9 π documentation bookmark item one https://bookmark-item-one.com bookmark item two https://bookmark-item-two.com bookmark item three https://bookmark-item-three.com ","date":"0001-01-01","permalink":"https://shashanksharma.xyz/nav/","summary":"π Pages Python notebooks Link Microblog Link π Personal link1 link2 link3 link4 link5 link6 link7 link8 link9 π¨ Tools link1 link2 link3 link4 link5 link6 link7 link8 link9 πΊ Blog link1 link2 link3 link4 link5 link6 link7 link8 link9 π Documentation bookmark item one https://bookmark-item-one.com bookmark item two https://bookmark-item-two.com bookmark item three https://bookmark-item-three.com ","title":"navigation"},]