[{"content":" introduction i have been playing games for a while, and it has been usually fps games, where it started from cs1.6 to csgo for more than 3,000 hours and playing valorant for 2 more years, and a couple of other ones, and at some point i realized i need to try some new style and started playing dota 2.\nmy first-ever game of dota 2 was at around may 2022, where i had no idea what i was doing and why i was doing it, those 5 games i remember closing them in between and had no idea what to build by looking at the items in the shop. for context, dota has around 125 heroes and understanding all the heroes takes significant amount of time, and even on top of it when i saw rubick hero, which steals other hero\u0026rsquo;s spells, i was so impressed by the game design and i devoted a major amount of time understanding and tried being good at it.\nthe journey through time source code all the visualization/animation is done via python, you can have a look at notebook with all the code present: [dota2-player-analysis.ipynb](/notebooks/dota2-player-analysis)\noverall winrate progression if you notice, initially, for a few 100 games, i struggled understanding the game because of which my net wins were in negative. but slowly, i started winning more than losses because of which win rate did increase and got almost linear thereafter.\nyour browser does not support the video tag. patch behaviour also, another quick observation, at some patches (blue lines), it requires players to complete objectives which makes them play into an uncomfortable position which leads to losing streak. for example, playing those heroes which they are not comfortable\nwin and losses per month this has always been surprising that my win/loss per month has been always approx 50%, still it doesn\u0026rsquo;t mean a player is not improving, for example plot above shows win/loss progression has been changing, and game itself is adapting which is giving me tougher games and slowly making me grow rather than throwing me in some random unfair game continuously.\ndota 2 gaming activity for past 3 years recent activities for past few weeks, i have been playing a bit less, since i started playing deadlock\nplaytime distribution one of the best visualization which i did, personally i dislike playing in the morning even if i have time, but you will notice few played time, which is there because i did play few during the weekend.\nsleep/dinner time as per my samsung sleep time statistics, my average bedtime is 04:55, hence a couple of games played even at around 04:00. 
another interesting observation: fewer games are played at 21:00 compared to 20:00 and 22:00, because that is when i eat dinner.\nplaying patterns throughout the week this is an interesting plot showing my playtime across the week and hours of the day. if you notice, on weekends i have been very aggressive about playing early or playing late, until 05:00.\nheroes played throughout my dota journey as mentioned in the introduction, rubick has been my favourite hero, and it is the hero i have been aiming to reach the grandmaster title with (a title to prove my love for that hero); you might enjoy this visualization of which heroes i have been playing for the past few years.\nview it in fullscreen for a better experience\nheroes selection as mentioned, dota 2 has around 125 heroes, out of which i liked playing support heroes a lot. that changed from 2023, when i started experimenting with a few heroes and enjoyed the offlane role as well; hence the slow growth for the top 10 heroes except rubick, and new heroes coming in on the left side.\nheroes played by their attributes some interesting observations here: i hate playing agility heroes, and i enjoy playing intelligence heroes, which are mostly supports, and strength heroes, which are offlaners.\n2024 compendium quest one of the things i like most about dota is the events they put out. i do feel bad thinking about some of the great or insane events i missed before 2022; looking at them now, i feel like i missed a lot. csgo got maybe 1-2 operations in my 4-year journey, but dota 2 has been playing a totally different game, just look at the prize pools for 2019-2021.\nthe compendium is like an interactive digital battle pass or seasonal passport in dota 2 that turns watching and playing the game into an engaging collection experience. think of it as a combination of a fantasy sports league, a digital collectible album, and a progression system.\nthe end goal is to reach level 300, which in turn gives you a miniature-size trophy of \u0026ldquo;the international\u0026rdquo;, the major tournament which happens once every year. to reach level 300, one has to complete objectives to earn levels:\nplay and win one dota game every day for 8 weeks continuously play the fantasy game and score in the 99th percentile predict tournament winners achieve patterns in bingo cards result bingo in bingo, i missed 3 cards in total, which is actually good as i lost only 6 levels.\nbingo card 1 bingo card 2 bingo card 3 fantasy in fantasy, i reached the maximum 99% rewards for two parts; in the second part i somehow messed up the key attributes assigned to players.\nfantasy 1: 99.01% fantasy 2: 96.90% fantasy 3: 99.15% oracle: win prediction win prediction was actually tough for me. surprisingly, my predictions for team liquid were spot on; i did hope tundra would win, but that didn't happen, so i lost a few points there.\ndecider lower bracket 1 lower bracket 2 finals point logs as you might have seen, i have been making data-driven decisions, and collecting the data has been a challenge. when it came to the point logs for my compendium 2024, i didn't know how i would be able to export this data into a table.\ni tried using vconsole2, which shows network calls to the dota 2 coordinator, but it didn't show any raw messages. i tried using wireshark as well to intercept the messages, but the binary data is difficult to decode. 
based on my knowledge i didn't know whether it was possible or not, so i decided to go the hard way, which is letting computer vision do the job.\nhence i recorded a video of the points log scrolling from bottom to top, fed each frame to python opencv code to detect unique entries, and created a csv file.\nframe from video grayscale thresholding full source code: link\nvideo link: link\nand finally, the parsed csv file: link (2 weeks of points missing)\nconclusion overall, i spent some money up front to purchase the compendium 2024, plus 1 extra set of levels to complete my 300 levels, and as of now i am really looking forward to the aegis 2024, hence looking forward to posting an image here :)\ni regret buying the extra levels, as i didn't know that week 8 would still be in progress; as of now my final level is 343.\nbonus hackerman moment\n03 oct 0.01h 07 oct 0.02h 10 oct 0.05h 14 oct 0.01h 21 oct 1.71h 22 oct 2.99h 23 oct 1.97h 24 oct 3.73h 25 oct 1.21h in total i spent approx 11.7 hours or more to write this blog post across 9 sessions.\nthank you for reading till the end :) ","date":"2024-10-25","permalink":"https://shashanksharma.xyz/posts/dota-compendium-2024-quest/","summary":"Introduction I have been playing games for a while, and it has been usually FPS games, where it started from CS1.6 to CSGO for more than 3,000 hours and playing Valorant for 2 more years, and a couple of other ones, and at some point I realized I need to try some new style and started playing dota 2.\nMy first-ever game of Dota 2 was at around May 2022, where I had no idea what I was doing and why I was doing it, those 5 games I remember closing them in between and had no idea what to build by looking at the items in the shop.","title":"a data-driven look at my dota 2 compendium quest"},]
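the frame-by-frame opencv approach described in the post above can be sketched roughly as follows. this is a minimal illustration, not the author's actual script: the file names, the threshold value, and the hash-based dedupe are assumptions (real per-entry extraction would additionally need ocr, e.g. pytesseract, on each kept frame).

```python
import csv
import cv2

# hypothetical recording of the points log scrolling from bottom to top
cap = cv2.VideoCapture("points_log.mp4")

seen = set()   # hashes of frames already kept, to skip duplicates
rows = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # grayscale + thresholding, as described in the post, to isolate text
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)
    # crude frame-level dedupe: hash the thresholded pixels and skip repeats;
    # turning kept frames into log entries would require ocr per row
    key = hash(binary.tobytes())
    if key not in seen:
        seen.add(key)
        rows.append([int(cap.get(cv2.CAP_PROP_POS_FRAMES)), key])

cap.release()
with open("points_log_frames.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame_index", "frame_hash"])
    writer.writerows(rows)
```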
[{"content":"","date":"2024-10-25","permalink":"https://shashanksharma.xyz/notebooks/blog-time-spent/","summary":"","title":"total time spent while working on something"},]
[{"content":"","date":"2024-10-24","permalink":"https://shashanksharma.xyz/notebooks/dota2-player-analysis/","summary":"","title":"dota 2 player analysis"},]
[{"content":"introduction one day, i had a thought, why is it that other organization can track all my stats and not share those raw data with me, even though i am the one who is generating it, obviously few of the services do allow sharing some insights like netflix watch history or google activities (limited), but it has its own issues like they are not detailed, or some histories are not persistent.\nfor me, it all started with my obsession with generating and tracking data, and also in school days when i used to track my day-to-day activities manually whenever i unlocked my phone, but obviously such manual tracking is far too irritating. now tracking my application usage can be either tracking my day-to-day activities, or all the processes running in my devices or tracking what\u0026rsquo;s running as an active application (application which i am consuming via display) and hence i choose the latter, which is tracking active application and requires minimal intervention from my end.\nrecently, i started using software to track my application usage on my laptop to generate raw data which later on i can use for analysis. i started this exercise when i joined hackerrank and started using an open-source software called \u0026ldquo;tockler\u0026rdquo;. it\u0026rsquo;s been over 2 years using tockler, and have been going great\nnow it\u0026rsquo;s been a few months, i have been very much into self-hosting and one of the features which i was looking for is, how to view all my device status if they are online or not, and if let\u0026rsquo;s say i had to write a client-side service for it, how will it work, and how the logic works like? and that is when i had a thought, what if i have to show my daily activity on my website, which let people know whether i am online or not, and if yes, then what exactly i am doing. this is very much similar to discord activity status\nchallenges with existing tracking software one thing which i am sure of was, i need a backend which may or may not store the activity and an api to fetch the details of it. for that to happen, there is a need to consume the data which is getting generated by these software packages, is it either interacting with their apis if they have, or directly access their storage, or make modifications to the software itself to publish it to some external storage.\nthat is when i started understanding how tockler works, since i have been using it for 2 years, and started looking if it is possible to publish the activity. one thing which i understood: there is no programmatical way of fetching data easily data is getting stored in sqllite3 once there have been a feature of pushing data into firestore, but it\u0026rsquo;s been deprecated after looking at these facts, i went ahead and decided, what if i can make publish the events every time an activity changes, which is tracking anytime it makes changes to sqlite3, it pushes to some other service. 
hence i added an event listener to understand this:\n+ appManager.knexInstance.on('query', (queryData) =\u0026gt; { + logger.info(`executing query: ${queryData.sql}, ${queryData.bindings}`); + }); even though it worked fine, it has its own disadvantages:\ni have to write a custom parser which understands these events and updates my backend service i will have to introduce changes to the forked repository and maintain it if an event fails to sync with the upstream service due to an internet issue or an api failure, i need a backup strategy, by maintaining state or having some queueing mechanism looking at these facts, it all becomes more complicated, and it made me realize that i need a better alternative to tockler which supports the same features but with more flexibility for my requirements.\nnew tracking software while looking for new software to track application activity, one big constraint i had was \u0026ldquo;portability\u0026rdquo;: it should be able to run on windows, macos and android (optional). after some research, i found activitywatch.\nactivitywatch's design looks somewhat like a microservice architecture: it has clients which run at localhost:3000, a server with api endpoints written in rust (migrated from python), and a set of watchers which track activity.\nthe way it stores data is easy to consume: https://notes.shashank-sharma.xyz/services/activity-watcher/schema\nsince the server runs on the client itself, the apis can be accessed from localhost:3000, and once i looked at the rust code, it indicated that events can be fetched, with filters as well.\none thing i was able to conclude was that i need not make any changes to the software itself; it gives me enough apis to use from an external service which i can create.\napi endpoints for activitywatch: https://github.com/activitywatch/aw-server/blob/master/aw_server/rest.py\ncreating backend so far i have mostly worked in the python ecosystem, which means flask, django/drf and fastapi, and recently i switched to golang when i joined hackerrank; given i have been working mostly in that language, i decided to use it so that i can learn more. at the same time, i don't want to build things from scratch like auth, db setup, logging and more, and while looking at a couple of frameworks i liked pocketbase, given it is open-source and has all the requirements built in.\none thing to note: pocketbase was originally not meant to be used as a framework, but i still decided to extend its features based on the documentation and write custom apis as per my requirements. as my next step, i had to understand how data is stored in activitywatch, read its sqlite3 schema, and decide what i need to store in my backend, which i have documented here: link\nonce finalized, another crucial decision was how to send an online event to the service saying whether the device is online or not and what exactly is running. the first simple approach is to have a post api endpoint and send the necessary details, or simply ping it. 
another viable approach could be figuring out newer protocols and using their features; for example, mqtt was something i liked, but the backend i chose didn't have it implemented and would require changes, hence i decided to go with the rest api approach.\ni decided to create a table which can support both the tockler backfill and the activitywatch events sent via api. once done, i created custom api endpoints for easy communication and added the necessary logic for handling edge cases, duplication and consistency over which records need to be created with the necessary fields present. i spent a good amount of time structuring my code for easy maintenance and better readability, and it can be seen here\nat last, i needed a better way to send events in an authenticated manner, not too strict on security but at least allowing only my events to be pushed, without worrying about auth token expiry or refresh tokens. for that, i created one static dev token which can be used for each api call specific to updating tracking items, and which only allows updating records for that particular user (a rough sketch of such a token-authenticated push is shown below).\ncreating client once the backend was ready, i wanted a client able to fetch the necessary records from the activitywatch apis and push them to the server. the requirements: it should be a desktop application, portable enough to run on windows/mac/linux/android, and able to keep running in the background if closed. this time i had a bias towards using golang for everything and picked fyne as my application framework.\ni started to play around with it, ran it on every device, and it has been working great. one downside i felt is that it is not optimized for resource usage: just a hello-world application bundles into a 70+ mb artifact, and ram usage is also slightly high relative to its job, which is nothing but making a post request every x seconds, but this is something anyone can live with.\ni spent some time writing a poc and tried it on each platform, and it worked perfectly. one downside on android is that it is not guaranteed to keep running as a background process, given how android handles processes; for example, the battery optimization feature might decide my app is taking a significant amount of battery and pause it. this can be overcome by changing a few settings, giving the app high priority, etc., but the android implementation is still experimental when it comes to fyne.\nonce finalized, i spent a good amount of time understanding how to work with fyne and how to structure the code; i decided to create a better layout for handling routes/pages, cron jobs, clients like the backend/activitywatch apis and more. i am skipping over a couple of things here, which are:\nability to push all events which have been tracked by activitywatch from day 1 maintaining state of which events have been processed figuring out all the buckets available in activitywatch and syncing only the user-preferred events detecting ongoing application usage and resending it so that it can be updated and the rest of the common stuff like auth checks, logging, cron initialization, closing all crons if the app is closed, etc. 
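as referenced above, a rough sketch of what such a token-authenticated push could look like. everything here is hypothetical — the route, field names and token handling are illustrations, not the actual pocketbase-backed api from the post (and the real client is written in go with fyne, not python):

```python
import requests

BACKEND_URL = "https://example-backend.xyz"   # hypothetical backend base url
DEV_TOKEN = "static-dev-token"                # long-lived token, as described

def push_active_window(device_id: str, app: str, title: str) -> None:
    """send one 'currently active window' event to the tracking backend."""
    resp = requests.post(
        f"{BACKEND_URL}/api/track/active-window",   # hypothetical route
        headers={"Authorization": DEV_TOKEN},
        json={"device": device_id, "app": app, "title": title},
        timeout=5,
    )
    resp.raise_for_status()

# the client would call this every x seconds with the current foreground app
push_active_window("laptop", "firefox", "reading docs")
```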
again, all this code can be found here\nand at last, everything worked great, and i have been using this app for more than 3 months.\nfinal result i did give some thought to adding a feature to spoof the data shown on my website, because showing my accurate application usage may bring unnecessary complications, but at the same time, spoofing data would ultimately mean cheating the visitors who come to my website and look at the status. hence, i decided to keep showing my application activity as it is, and if in the future i face any trouble with it, i'll simply hide the application name and show only the online status.\nconclusion doing all this gives you the satisfaction of seeing all your activity in one place, knowing that you can peek into the past and see what you did over the past x years. having the data in one place opens up a lot more use cases for what you can do moving forward, for example:\nwhat is my working time vs personal time how distracted am i in a given timeframe across different devices which application have i been using a lot lately, is it zoom for meetings or slack for async communication and the one i am actually interested in: using an llm to understand the data or find anomalies which are difficult to spot with just sql queries/filters. bonus the mountain graph which you can see at the bottom is built from this same data; i have a python notebook for fetching and generating it, which can be found here\n05 aug 0.81h 06 aug 0.25h 07 aug 3.85h 08 aug 4.42h 09 aug 1.13h 10 aug 3.89h 11 aug 4.19h 12 aug 0.85h 13 aug 0.05h 14 aug 3.04h 15 aug 1.53h 16 aug 1.41h 17 aug 3.79h 18 aug 1.55h in total i spent approx 30.8 hours or more to write this blog post across 14 sessions.\nthank you for reading till the end :) ","date":"2024-08-20","permalink":"https://shashanksharma.xyz/posts/building-live-device-feed/","summary":"Introduction One day, I had a thought, why is it that other organization can track all my stats and not share those raw data with me, even though I am the one who is generating it, obviously few of the services do allow sharing some insights like Netflix watch history or google activities (limited), but it has its own issues like they are not detailed, or some histories are not persistent.\nFor me, It all started with my obsession with generating and tracking data, and also in school days when I used to track my day-to-day activities manually whenever I unlocked my phone, but obviously such manual tracking is far too irritating.","title":"building live device feed for my website"},]
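for reference, reading events out of a locally running activitywatch server (as the post above does from its go client) looks roughly like this in python. the endpoints below are from aw-server's public rest api; the port is the usual default install and may differ from the one mentioned in the post:

```python
import requests

# aw-server usually listens on localhost:5600 by default; adjust if needed
AW_URL = "http://localhost:5600"

# one bucket exists per watcher (e.g. window watcher, afk watcher)
buckets = requests.get(f"{AW_URL}/api/0/buckets/").json()

for bucket_id in buckets:
    # fetch the most recent events from each bucket
    events = requests.get(
        f"{AW_URL}/api/0/buckets/{bucket_id}/events",
        params={"limit": 10},
    ).json()
    for event in events:
        print(bucket_id, event["timestamp"], event.get("data"))
```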
[{"content":"preface the main reason behind this blog is to understand the seriousness of mistakes that were made without any proper arrangements and the impact of such mistakes. in this blog, i\u0026rsquo;ll be covering how the rgpv exam portal was so vulnerable that it exposed around 49 thousands of students\u0026rsquo; private data/submissions publicly and how it could have ruined everyone\u0026rsquo;s exam without any authentication.\nimpact of given vulnerabilities include:\n49,000 students of rgpv student data leak which includes all possible pii data (like: phone number, email, etc) question paper leak submit exam for any other student unlimited image upload to their server lfi exploit which allows you to see file content from their server view all student submission like question answered from final exams which held during 24th aug - 27th aug 2020 introduction after covid-19 pandemic started, rgpv university announced that for final year students, they are planning to organize an exam which will consist of mcq questions (total 40 questions) and it will be an open book exam as mentioned here. this process started by taking two mock exams to test if things are good or not, and then having the final paper from the 24th of august till the 1st of september 2020.\nnow with given information, rgpv decided to create one exam portal for us, which was handled by eduvita. they started registration and creation of portal at rgpvexam.in. how the overall exam process was figured out is by having one unique url for every student assigned, which allows for that particular student to give an exam.\nstep towards testing exam portal during the first day of the exam, there were many issues where it was returning: network error. the full story can be found here which shows that there were technical difficulties and i also faced this issue.\nnow being a computer science engineer, i was already curious about this scenario, and as of my first step, i started with the debugger tool and investigated what was happening. at first, i encountered one debugger state which they initialized in their index.html file, which avoids people to debugging their javascript code. if you are curious, read this here.\non bypassing this, i observed my network tab and found that there was a 504 error returned, as shown in the image.\nnow as of the next step, i was more interested in knowing if their website is vulnerable or not? this curiosity leads me to use the nmap tool, i started one vulnerability check script, and after 5 min, there were two exploits which i found:\nhttp-phpmyadmin-dir-traversal: php file inclusion vulnerability which allows remote attackers to include local files via the redirect parameter, possibly involving the subform array. http-vuln-cve2011-3192: vulnerable to a denial of service attack when numerous overlapping byte ranges are requested. the second vulnerability was more towards ddos attack, but the first vulnerability was shocking, as it was an lfi exploit (local file inclusion). 
to confirm the hypothesis, i tested it by traversing random directories, and after 8 - 10 attempts i was able to reproduce it with:\n\u0026lt;base_url\u0026gt;/index.php/?p=../../../\u0026lt;any_file_name\u0026gt;\nimpact: allows anyone to access local files and directories, which includes anything like environment keys, logs, or other sensitive data.\nunderstanding tech stack before moving ahead, i started by exploring their tech stack, and with a little bit of exploration i found out that they are using:\nfrontend - angular js backend - express web server - nginx database - mongodb with that knowledge, i went ahead observing their api structure and started messing around with it. at first i sent one random post request, and in the response i got:\nthis was my first red flag that things were not right, because in production it is not good practice to expose the error stack. later on the web server went down, so i had to call it a day.\nthe next day i repeated the vulnerability test and found that the lfi exploit was no longer present, which is an improvement. (the lfi was there till the 24th of august 2020.)\non my next paper on the 26th of august 2020 i was prepared for further testing: i started capturing all the network requests and noted down all the endpoints and their content. a typical exam on the rgpv exam portal looked like:\ncheck otp: \u0026lt;base_url\u0026gt;/common/student/urlcheck check dob/fathers name: \u0026lt;base_url\u0026gt;/common/student/checkdob upload photograph: \u0026lt;base_url\u0026gt;/common/student/checkexamconfig confirm profile: \u0026lt;base_url\u0026gt;/common/student/confirmprofile waiting phase: \u0026lt;base_url\u0026gt;/common/student/checkexamconfig exam started: \u0026lt;base_url\u0026gt;/common/paper/\u0026lt;exam_code\u0026gt; exam submission: \u0026lt;base_url\u0026gt;/common/paper/\u0026lt;exam_code\u0026gt; the problem with all these steps is that, except for steps 1 and 3, each post request needs only enrollment_no in its body to get a response, which is worrying because ideally there should be a jwt token in a header acting as authentication.\nimpact given any person's enrollment_no which exists in their database, anyone can:\n1. question paper leak get the full question paper in json format as a response, including the full question list and answer list.\n2. create/update/delete exam submission as explained above, with just the enrollment number anyone can overwrite another student's submission easily, without any authentication.\n3. upload image without auth the third step is uploading your photo through the webcam. from the client side, it captures an image and sends a request to the backend endpoint with only the file content and no authentication. so, in reality, anyone can upload any image to the given endpoint and spam their aws s3 bucket with cat photos.\n4. minor data leak the fourth step is just a confirmation page which takes enrollment_no and responds with:\nname ip address institute name reverse engineering frontend understanding compiled angular js code is quite painful, but i was very interested in all the endpoints registered in the code. as a first step, i saved the rgpvexam.in page, prettified their js code and searched for \u0026ldquo;http.post\u0026rdquo; in it. 
once i understood their code i found all possible endpoint registered over there:\ncheckurl(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { query: e }) } checkdob(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/checkdob`, { query: e }) } checkexamconfig(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/checkexamconfig`, { query: e }) } confirmprofile(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/confirmprofile`, { query: e }) } getresult(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/getresult`, { query: e }) } getquestions(t, e) { return this.http.post(`${this.baseurl}/paper/${t}`, e) } updateresult(t, e) { return this.http.post(`${this.baseurl}/paper/updateresult/${t}`, { query: e }) } updateseen(t, e) { return this.http.post(`${this.baseurl}/paper/updateseen/${t}`, { query: e }) } updateanswered(t, e) { return this.http.post(`${this.baseurl}/paper/updateanswered/${t}`, { query: e }) } getdata(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { query: e }) } searchdata(t, e = {}, n = 1, i = 10) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { query: e }) } createdata(t, e) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { doc: e }) } updatedata(t, e, n) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { doc: n }) } deletedata(t, e) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { id: e }) } maillink(t) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { enrollment_no: t }) } mailotp(t) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { enrollment_no: t }) } checkotp(t, e) { return this.http.post(`${this.baseurl}/${t}/urlcheck`, { enrollment_no: t }) } uploadfile(t) { const e = new formdata; return e.append(\u0026#34;file\u0026#34;, t, t.name), this.http.post(dc.url + \u0026#34;/api/file/uploadstudentimage\u0026#34;, e, { reportprogress: !0, observe: \u0026#34;events\u0026#34; }) } uploadface(t, e) { const n = new formdata; return n.append(\u0026#34;file\u0026#34;, e, e.name), this.http.post(dc.url + \u0026#34;/api/file/uploadface/\u0026#34; + t, n, { reportprogress: !0, observe: \u0026#34;events\u0026#34; }) } this confirms that all endpoints were using enrollment_no as their identifier without any authentication.\nbackend once i knew about all the endpoints, it was time to test other endpoints as well which were not mentioned in the frontend code. 
this is where i started experimenting with random endpoints, because there had to be something out there.\none thing which was repeated in each endpoint was:\n\u0026lt;base_url\u0026gt;/common/\u0026lt;something\u0026gt;\nwhere something is either student or paper, but for testing i tried a random string like \u0026ldquo;table\u0026rdquo;, and the response was a bit shocking, as shown in the image.\nhere, \u0026ldquo;table not found\u0026rdquo; means that anything after /common/ is treated as a table name, and the endpoint returns all data present for that table.\ni tried multiple strings like \u0026ldquo;exam\u0026rdquo;, \u0026ldquo;user\u0026rdquo; or anything sensitive, and found multiple table names which were present.\nto show how serious this is: anyone with bad intentions could have written a script with common table names and spammed the endpoint to extract every single record from every table.\nimpact: the given vulnerability existed till the 28th of august 2020, which covers the mock tests and two final exams for all branches\n1. /common/student reveals the information of the 49 thousand students present in the database who registered for the rgpv exam portal, and this includes every piece of pii which rgpv has (like dob, ip, phone_no, email, etc), including the unique id and otp which were meant to be kept private by each student, and much more.\nin general, the privacy of every student was compromised, and no one knows how many people extracted all this data and may have sold it or used it for marketing purposes.\n2. /common/result reveals every student's submission; this includes all answers given for each question, at which time each question was seen, and more.\nin short, anyone could know which student took which exam, how correct they were, and how much time they took.\n3. /common/institute all institutes with their id and name present in the db. this was not particularly sensitive, but still not good to share.\nconclusion this whole process of organizing the final exam was heavily rushed, which is why i found multiple vulnerabilities. i found lfi and data exposure in every possible way; the privacy of around 49 thousand students was compromised by the exam portal. the problem is that this kind of mistake is not reversible, the damage is already done, but thanks to rgpv that they realized this later on and fixed the problem.\nwhat did i do from my end? as soon as i found out about these exploits, i approached almost all possible contacts:\nrgpv - no reply from \u0026ldquo;rgpvexam2020@rgtu.net\u0026rdquo; nciipc - got a reply + sent a report aicte - no reply for minor questions i reached out via telegram, but there was no response there either.\nwhat's next? as of the 28th of august 2020, we were notified in the telegram channel that:\nin short, some security updates were made, but what exactly they were is still unknown from the given message; i believe they should at least have acknowledged what sort of mess they created to start with. there are still a few things which need improvement.\nfrom my end, once i saw the message, i quickly checked all the api endpoints again, and those exploits are finally gone. getting a student's data or any table data is no longer possible, so that is fixed. the whole examination process now uses a jwt token: once the user enters with a unique url and confirms dob and father's name, a jwt token is returned and used for the subsequent endpoints. 
so this is great news for everyone.\nbut the real question remains: is this the solution? technically yes, but what about the damage which has already happened? the data is already leaked, and because each user's otp was leaked, we are all still exposed.\nhaving no authentication mechanism to start with and introducing it later is an improvement, but why was this compromised on a platform with such a large audience? and who will be held responsible for it? that is something which is still unknown.\n","date":"2024-06-30","permalink":"https://shashanksharma.xyz/posts/how-vulnerable-was-rgpv-exam-2020/","summary":"Preface The main reason behind this blog is to understand the seriousness of mistakes that were made without any proper arrangements and the impact of such mistakes. In this blog, I\u0026rsquo;ll be covering how the RGPV Exam portal was so vulnerable that it exposed around 49 thousands of students\u0026rsquo; private data/submissions publicly and how it could have ruined everyone\u0026rsquo;s exam without any authentication.\nImpact of given vulnerabilities include:\n49,000 students of RGPV student data leak which includes all possible PII data (like: phone number, email, etc) Question paper leak Submit exam for any other student Unlimited image upload to their server LFI Exploit which allows you to see file content from their server View all student submission like question answered from final exams which held during 24th Aug - 27th Aug 2020 Introduction After Covid-19 pandemic started, RGPV University announced that for final year students, they are planning to organize an exam which will consist of MCQ questions (total 40 questions) and it will be an open book exam as mentioned here.","title":"how vulnerable was rgpv exam 2020"},]
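to make the missing-authentication problem described in the post above concrete, this is roughly what such a request looked like, reconstructed from the quoted frontend code. the base url and enrollment number are placeholders; the 2020 portal has long since been fixed, and this is shown only to illustrate why an identifier is not a credential:

```python
import requests

# placeholders only; the portal described above no longer works this way
BASE_URL = "https://example-exam-portal.in/common"
ENROLLMENT_NO = "0000XX000000"

# per the quoted frontend code, most endpoints accepted a body of the form
# {"query": {...}} keyed only on the enrollment number, with no auth header
resp = requests.post(
    f"{BASE_URL}/student/checkdob",
    json={"query": {"enrollment_no": ENROLLMENT_NO}},
    timeout=10,
)
print(resp.status_code)
```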
[{"content":"preface testing\n","date":"2024-06-30","permalink":"https://shashanksharma.xyz/microblog/test-micro-blog/","summary":"Preface Testing","title":"test micro blog"},]
[{"content":" introduction python is really a powerful language and with proper use of it anyone can make beautiful things. after studying python i was really impressed by its power and to be more specific i really love how we can scrape any website easily with the help of python. scraping is a process of extracting data from website by their html data. so i learned its basic and started scraping many website.\nrecently i thought of creating something big through scraping but i was having no idea what to do. then i came across with the site of mp transportation and i realized that they got so many data inside there website. the website is very simple, you open the site enter your transport number details and then search it. then you will get result about your transport vehicle which includes type, color etc.\nwith python2.7 i created one script to scrape because with python 3.x there were less support to some modules. i decided to go for \u0026rsquo;last\u0026rsquo; search type because with others i was facing some issues (may be site problem). for this i will have to search each input from 0000 - 9999 in short it makes around 10000 requests. we took 4 digits because it requires min 4 characters to enter. so yeah it was this large.\ni created one program and started scrapping but then with 0000 input and \u0026rsquo;last\u0026rsquo; type search i found that it scraped successfully and i got 1700+ data. but the problem was that it took 5 minutes to scrape 1 request. this happened because of server delay. it was not my problem but it was server\u0026rsquo;s problem to search this much data from database. after realizing this i did some maths.\nif 1 request take = 5 minutes, then, 10000 requests = 50000 minutes = 833.33 hours = 35 days approx = 1 month 4 days\nso in short i need my laptop to run forΒ 1 month and 4 daysΒ to run continuously and trust me it\u0026rsquo;s really a bad idea to do so. but is it worth doing it ?\nif 1 request is giving approx 1000 data 10000 requests = 10,000,000\nso yeah, hypothetically inΒ 35 days i will be able to achieve 10 millions of data. but still being a programmer we must do stuff as fast as possible and to achieve this one thing is sure that i need some power, memory, security etc. i tried multiprocessing and multi threading but it was not working as expected\nso the solution for this problem was getting your hand on some free servers. so i started searching some free website host company which supports python and thought of deploying my script over there. i tried this in pythonanywhere.com and in heroku with the help of flask framework but there was no success. i waited almost 15 days to decide what to do. later i found one site scrapinghub.com which lets you deploy spider on cloud and rest they will take care of that so i went for it and started learning it.\nafter that i learned how to use scrapy and scrapinghub and i created another new program to scrape website with the help of scrapy spiders. 
source code for this is at the end of this page\nexperiment day 1 - 4,092,328 (4 million records in 17 hours) id1 - items - 1,134,421 (15 hours)\nid2 - items - 1,025,282 (17 hours)\nid3 - items - 983,367 (14 hours)\nid4 - items - 949,228 (13 hours)\nsize - 1.3 gb day 2 - 6,498,462 (6.4 million records in 17 hours)\n(created 2 more ids to boost the process)\nid1 - items - 1,241,643 (17 hours)\nid2 - items - 1,000,308 (15 hours)\nid3 - items - 962,863 (15 hours)\nid4 - items - 1,052,844 (15 hours)\nid5 - items - 1,144,686 (16 hours)\nid6 - items - 1,096,118 (15 hours)\nsize - 2.4 gb\nfinal result total data collected: 10,590,790 total size: 3.7 gb\ntime consumed: 34 hours\nin just 34 hours of scraping we collected the 10 million records estimated earlier. if we had tried to do this the old-fashioned way on a laptop, it would have taken a month, so this is a big optimization.\ndata analysis the main question is what to do with the data and which tools to use for analysis, since our json files are huge. converting the json files to a database would be really nice, but doing that would again require loads of time.\nfrom json to database\nwe can do 5 records per second,\nfor 10,000,000 records = 2,000,000 seconds = 33,333 minutes = 555 hours = 23 days.\nthat is not feasible.\ni even tried doing it through an sql script, which is much better compared to the previous one, but it would still take approx 20 days.\nso we will use the data in json format, load it into a python script and do our maths there. loading one file may take approx 10 minutes, but time is not the issue; the problem is that loading a json file in python takes a lot of memory, and since we are working on a normal laptop we need to think of something else. to avoid this problem i used the ijson module in python: a really handy tool which iterates over the json data rather than loading it all at once (a small sketch is shown after this post). with this we sacrifice a little speed, but it is worth it.\nstats in which district are the most vehicles registered?\nindore - 1625663 bhopal - 1023054 jabalpur - 589875 gwalior - 477625 ujjain - 371559 sagar - 272974 chhindwara - 268971 ratlam - 258581 rewa - 242377 dewas - 240930 link: https://plot.ly/~shashank-sharma/19/\nwhich color do people prefer while buying a vehicle?\nblack - 2137200 red - 683663 not specify - 560975 blue - 341134 grey - 288952 white - 283631 silver - 255836 rbk - 238896 p black - 177379 pbk - 168518 link: https://plot.ly/~shashank-sharma/11/\nwhich company has the most vehicles in mp?\nhero honda motors - 2032369 bajaj auto ltd - 1677867 hero moto corp ltd. - 1563023 tvs motor co. ltd. 
- 1130974 honda mcy \u0026amp; scooter p i ltd - 1102624 mahindra \u0026amp; mahindra ltd - 463175 tata motors ltd - 280684 maruti suzuki india limited - 258392 maruti udyog ltd - 249949 escorts ltd - 139231 link: https://plot.ly/~shashank-sharma/13/\nin which year were the most vehicles registered?\n2016 - 1406802 2014 - 1392520 2015 - 1166079 2013 - 964026 2011 - 845374 2012 - 734092 2010 - 716772 2009 - 607693 2008 - 481315 2007 - 471963 link: https://plot.ly/~shashank-sharma/15/\nwhich vehicle model is most common?\nsplendor plus - 325878 platina - 302537 hf deluxe self cast wheel - 254166 activa (ele auto \u0026amp; kick start) - 216252 tvs star city - 210397 cd dlx - 188885 discover dts - si - 180193 passion pro(drm-slf castwheel) - 163088 activa 3g eas ks cbs bs3 - 162542 passion plus - 146584 link: https://plot.ly/~shashank-sharma/17/\nwhat type of vehicle do people own in majority?\nmotor cycle - 6531708 scooter - 1291932 motor car - 881930 tractor - 687360 goods truck - 210932 moped - 197450 omni bus for private use - 142478 auto rickshaw passenger - 124051 trolly - 111358 pick up van - 95238 link: https://plot.ly/~shashank-sharma/9/\nand many more questions can be answered with the given data.\nthank you for reading till the end of this page. i hope by now you realize the real power of python.\nsource code: https://github.com/shashank-sharma/mp-transportation-analysis\n","date":"2017-04-16","permalink":"https://shashanksharma.xyz/posts/india-mp-transportation-analysis/","summary":"Introduction Python is really a powerful language and with proper use of it anyone can make beautiful things. After studying Python I was really impressed by its power and to be more specific I really love how we can scrape any website easily with the help of python. Scraping is a process of extracting data from website by their html data. So I learned its basic and started scraping many website.\nRecently I thought of creating something big through scraping but I was having no idea what to do.","title":"india's mp transportation analysis through python"},]
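the ijson trick mentioned in the post above can be sketched like this. the file name and record layout (a top-level json array of objects with a "color" field) are assumptions for illustration, not the post's actual schema:

```python
import ijson
from collections import Counter

# stream the huge scraped json file instead of loading it into memory at once;
# "item" iterates over the elements of a top-level json array
color_counts = Counter()

with open("mp_transport_day1.json", "rb") as f:
    for record in ijson.items(f, "item"):
        color_counts[record.get("color", "not specify")] += 1

print(color_counts.most_common(10))
```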
[{"content":"what i do? software engineer at coursera got something to discuss? drop me a mail at: shashank.sharma98@gmail.com","date":"0001-01-01","permalink":"https://shashanksharma.xyz/about/","summary":"What I do? Software Engineer at Coursera Got something to discuss? Drop me a mail at: shashank.","title":"about"},]
[{"content":"π pages python notebooks link microblog link π personal link1 link2 link3 link4 link5 link6 link7 link8 link9 π¨ tools link1 link2 link3 link4 link5 link6 link7 link8 link9 πΊ blog link1 link2 link3 link4 link5 link6 link7 link8 link9 π documentation bookmark item one https://bookmark-item-one.com bookmark item two https://bookmark-item-two.com bookmark item three https://bookmark-item-three.com ","date":"0001-01-01","permalink":"https://shashanksharma.xyz/nav/","summary":"π Pages Python notebooks Link Microblog Link π Personal link1 link2 link3 link4 link5 link6 link7 link8 link9 π¨ Tools link1 link2 link3 link4 link5 link6 link7 link8 link9 πΊ Blog link1 link2 link3 link4 link5 link6 link7 link8 link9 π Documentation bookmark item one https://bookmark-item-one.com bookmark item two https://bookmark-item-two.com bookmark item three https://bookmark-item-three.com ","title":"navigation"},]